Mining hyperintervals: Getting to grips with real-valued data

Author

  • J. E. Witteveen
Abstract

Many data mining tasks, such as clustering, classification, the construction of decision trees, subgroup discovery and itemset mining, often cope poorly with real-valued data. In fact, it is common for data mining methods to work well only on nominal data with few distinct values. We build the theory to fill this gap for data from arbitrary uncountable sets and introduce an efficient method to mine such data without the usual discretization as a pre-processing step. We show that discretization is not needed in order to make use of the MDL principle.


Similar resources

RealKrimp - Finding Hyperintervals that Compress with MDL for Real-Valued Data

The MDL Principle (induction by compression) is applied with meticulous effort in the Krimp algorithm for the problem of itemset mining, where one seeks exceptionally frequent patterns in a binary dataset. As is the case with many algorithms in data mining, Krimp is not designed to cope with real-valued data, and it is not able to handle such data natively. Inspired by Krimp’s success at using ...


(T) FUZZY INTEGRAL OF MULTI-DIMENSIONAL FUNCTION WITH RESPECT TO MULTI-VALUED MEASURE

Introducing more types of integrals will provide more choices to deal with various types of objectives and components in real problems. Firstly, in this paper, a (T) fuzzy integral, in which the integrand, the measure and the integration result are all multi-valued, is presented with the introduction of T-norm and T-conorm. Then, some classical results of the integral are obtained based on the prope...


A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining

Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...


A Study of Improving the Performance of Mining Multi-Valued and Multi-Labeled Data

Nowadays, data mining algorithms are successfully applied to analyze real-life data and provide useful suggestions. Since some available real data is multi-valued and multi-labeled, researchers have focused their attention on developing approaches to mine multi-valued and multi-labeled data in recent years. Unfortunately, there are no algorithms that can discretize multi-valued and multi-lab...


Pattern Discovery for Locating Motifs in Multivariate, Real-valued Time-series Data

The problem of locating motifs in multivariate, real-valued time series data concerns the discovery of sets of recurring patterns embedded in the time series. Each set is composed of several nonoverlapping subsequences and constitutes a motif because all of the subsequences are similar. This task is a natural extension of univariate motif discovery in both the symbolic and real-valued domains a...




Publication date: 2012